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Photon-Plasma: a modern high-order particle-in-cell code 
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0ster Voldgade 5-7, DK-1350 Copenhagen, Denmark 
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We present the Photon-Plasma code, a modern high order charge conserving particle-in-cell code for simulating 
relativistic plasmas. The code is using a high order implicit field solver and a novel high order charge 
conserving interpolation scheme for particle-to-cell interpolation and charge deposition. It includes powerful 
diagnostics tools with on-the-fly particle tracking, synthetic spectra integration, 2D volume slicing, and a 
new method to correctly account for radiative cooling in the simulations. A robust technique for imposing 
(time-dependent) particle and field fluxes on the boundaries is also presented. Using a hybrid OpenMP and 
MPI approach the code scales efficiently from 8 to more than 250.000 cores with almost linear weak scaling 
on a range of architectures. The code is tested with the classical benchmarks particle heating, cold beam 
instability, and two-stream instability. We also present particle-in-cell simulations of the Kelvin-Helmholtz 
instability, and new results on radiative collisionless shocks. 



I. INTRODUCTION 

Particle-In-Cell models have gained widespread use in 
astrophysics as a means to understand plasma dynamics, 
particularly in collisionless plasmas, where non-linear in- 
stabilities can play a crucial role for launching plasma 
waves and accelerating particles. The advent of of tera- 
and now peta-flop computers has made it possible to 
study the macroscopic properties of both relativistic and 
non-relativistic plasmas from first principles in large scale 
2D and 3D models, and sophisticated methods, such as 
the extraction of synthetic spectra is bridging the gap 
between models and observations. 

While Particle-In-Cell codes were some of the first 
codes to be developed for computers 1 ' 2 , and several clas- 
sic books have been written on the subject 3 ' 4 , modern 
numerical methods are in use today in the community, 
which did not exist then, and the temporal and spatial 
scales of the problems have grown enormously. Further- 
more, in the context of astrophysics the modeling of rel- 
ativistic plasmas has become of prime importance. 

In this paper we present the relativistic particle-in-cell 
PhotonPlasma code in use at the University of Copen- 
hagen, and the numerical and technical methods imple- 
mented in the code. The code was initially created dur- 
ing the 'Thinkshop' workshop at Stockholm University 
in 2005, and has since then been under continuous de- 
velopment. It has been used on many architectures from 
SGI, IBM, and SUN shared memory machines to modern 
hybrid Linux GPU clusters. Currently our main plat- 
forms of choice are Blue-Gene and Linux clusters with 
8-16 cores per node and infiniband. We have also devel- 
oped a special GPU version that achieves a 20x speedup 
compared to a single 3GHz Nehalem core (to be presented 
in a forthcoming paper). The code has excellent scaling, 
with more than 80% efficiency on Blue-Gene/P scaling 



weakly from 8 to 262.144 cores, and on Blue-Gene/Q 
from 64 to 524.288 threads. The I/O and diagnostics is 
fully parallelized and on some installations we reach more 
than 45 GB/s read and 8 GB/s write I/O performance. 

In Section II and III we introduce the underlying equa- 
tions of motion, and the numerical techniques for solving 
the equations. We present our formulation of radiative 
cooling in Section IV, and in Section V the various initial 
and boundary conditions supported by the code. Section 
VI presents on-the-fly diagnostics, including the extrac- 
tion of synthetic spectra. Section VII describes the bi- 
nary collision modules, while Section VIII contains test 
problems. In Section IX we discuss aspects of paralleliza- 
tion and scalability, and finally in section X we finish with 
concluding remarks. 



II. EQUATIONS OF MOTION 

The PhotonPlasma code is used to find an approx- 
imate solution to the relativistic Maxwell- Vlasov system 
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where s denotes particle species in the plasma (electrons, 
protons, . . . ), 7 = (1 — (u/c) 2 ) -1 / 2 is the Lorentz factor, 
and C = df s /dt\ cM denotes some collision operator. 

In a completely collisionless plasma C is zero, but in 
the code we also allow for binary collisions between parti- 
cles. As shown below in the tests, discretization effects in 
the interpolation of fields and sources between the mesh 
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and the particles and the integration of the equations of 
motion lead to a non-zero, non-physical, collision term, 
which should be minimized, especially in the case of col- 
lisionlcss plasmas, but also with respect to any collisional 
modeling term introduced explicitly. The charge and cur- 
rent densities are given by taking moments of the distri- 
bution function over momentum space 



p c (x) = / du^2q s f s (x,u) 
J s 

J(x) = J du^2uq s f s (x,u) . 



(6) 
(7) 



To find an approximate representation for this six- 
dimensional system in the particle-in-cell method so- 
called macro particles are introduced to sample phase 
space. Macro particles can either be thought of as La- 
grangian markers that measure the distribution func- 
tion in momentum space at certain positions, or as en- 
sembles of particles that maintain the same momentum 
while moving through the volume. If the trajectory of 
a macro particle is a solution to the Vlasov equation, 
given the linearity, a set of macro particles will also be 
a solution to the system. Other continuum fields, which 
only depend on position, are sampled on a regular mesh. 
Macro particles are characterized by their positions x p 
and proper velocities p p = u-f. They have a weight fac- 
tor w p , giving the number density of physical particles 
inside a macro particle, and a physical shape S. The 
shape is chosen to be a symmetric, positive and sep- 
arable function, with a volume that is normalized to 
1. For example in three dimensions it can be written 
S(x - x p ) = Siu{x - x p )SiT>{y - y P )Sm(z - z p ), and 
/ S(x — x p )dx = 1. The full distribution function of a 
single macro particle is then 

fp{x,p)=w p S(p-p p )S(x-Xp). (8) 

Inserting the above in Eq. 1 and taking moments 3-5 we 
find the equations of motion for a single macro particle, 

-£=u p -^ = ± { Ep + Up*Bp), (9) 

where the electromagnetic fields are sampled at the 
macro particle position through the shape functions 



E p — E 



(x p ) = jdxE{x)S{x-Xp) (10) 
Bp = B(x p ) = jdxB{x) S(x - x p ) . (11) 

The Vlasov equation (Eq. 1) is linear, and if a single 
macro particles obeys Eqs. 9 a collection of macro parti- 
cles, describing the plasma, will also obey it. Using that 
the shape functions are symmetric, and assuming that 
the electromagnetic fields are constant inside each cell 



volume we find 

E p = J2 E(x c )W(x c -x p ) (12) 

jc c =ccll vertices 

B p = B(x c )W(x c -x p ), (13) 



x c — cell vertices 



where the weight function W is given by integrating the 
shape function over the cell volume 



W(x c - x p ) 



dx S(x — x p ) . 



(14) 



In principle any shape function would be valid. How- 
ever, in practice most PIC codes employ shape functions 
belonging to a family of basis functions known as B- 
splines. They have a number of useful properties 6 : 

• The particle interpolation function, a B-spline of 
order O, is O — 1 times differentiablc continuous, 
for O > 1 

• Particle weight functions also become £?-sphnes of 
order O + 1 

• Their support is bounded; their width in number 
of mesh points is equal to their order + 1. 

• The sum on mesh points over the support is 
Support B o = i 7 for > o 

In the PhotonPlasma code one can select particle 
shape functions from one of the four lowest order B- 
splines (see also Fig. 1) giving the weight functions: 



NGP: 'Nearest grid point', 
W°(x) 



o^ = <! l if 1*1 < I 
otherwise 



CIC: 'Cloud-In-Ceir, 



W 1 (x) = 



1-1*1 if|<5|<l 







otherwise 



TSC: 'Triangular Shaped Cloud', 



if |*| < i 

W 2 (x) = {±(l-\6\) 2 if i < |*| < # 
otherwise 

PCS: 'Piecewise Cubic Spline', 



W 3 (x) 



<\ (4-6* 2 + 3|cfJ ifO<|*|<l 
|) 3 ifl<|*|<2 
otherwise 
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FIG. 1. To the left, the first four B-splines, {-B 1 , . . . , B 4 }, which define the particles' shape functions (interpolation kernels). 
These four are all accessible in the code. To the right, the cell (area) weighting of the interpolation kernel for the triangular- 
shaped-cloud. This shape function is the piecewise linear spline, whereas the piecewise quadratic spline shape produces the 
bell-shaped (cubic spline) area weighting function - cf. Eq. 14. 



where W l (x) is the one dimensional weight function, and 
5 = (x c — x p )/Ax. 

Normally, there are many more particles than mesh 
points in a particle-in-cell model. One important bene- 
fit of introducing a higher order particle shape function 
is a reduction in the aliasing effects associated with the 
under-sampling when interpolating particle data on the 
mesh. When employing a higher order field solver, also, 
a higher order particle shape function such that the ef- 
fective width of the particle shape function response and 
the field differencing operator response are 'not too far 
apart'. If the particle shape function has a very low or- 
der — say = — the strong aliasing at high frequency 
may be visible to a high order — say O = 6 — differ- 
ential operator, which will introduce spurious contribu- 
tions from the Maxwell source term(s). One should thus 
take care to keep the order of the particle shape function 
'high', if the field solver difference operators have order 
'high'. On balance, using the highest order cubic spline 
interpolation with 64 mesh points in three dimensions 
to couple the fields and the particles has turned out to 
be effectively the cheapest method in most applications. 
The large support leads to better noise properties, and 
hence a lower amount of particles can be used to reach 
the same quality compared to using more particles and 
a lower order i?-spline. The cost is also partially offset 
by the increased number of FLOPS per memory transfer, 
when compared to a lower order scheme integration cycle 
resource consumption. 

III. DISCRETIZATION AND TIME INTEGRATION 

To solve the equations of motion for the coupled 
particle-field system (Eq. 4, Eq. 5, and Eq. 9) we need 
to choose both a spatial and temporal discretization, and 
use either explicit or implicit spatial derivatives and time 
integration techniques. To optimally exploit the reso- 
lution, and taking into account the symmetries of the 



Maxwell equations, we use a Yee lattice 7 to stagger the 
fields. The charge density is located at cell centers. To 
comply with Gauss law (Eq. 2) the electric fields and 
the current density, both entering with the same spa- 
tial distribution in Ampere's law (Eq. 5), are staggered 
upwards to cell faces, while magnetic fields, to be con- 
sistent with the curl operator in Eq. 5, are placed at 
cell edges. With this distribution (see Fig. 2) the deriva- 
tives in Eqs. 2-5 are automatically calculated at the right 
spatial positions, and no interpolations are needed. Be- 
cause of the spatial staggering the numerical derivatives 
commute, and the magnetic field evolution conserves di- 
vergence to round-off precision. The electric potential 
4>e is located at cell centers, and the magnetic poten- 
tial <j)B is at cell corners. For the time integration we 
stagger the proper velocity of the macro particles wy p 
and the associated current density J backwards time. In 
the PhotonPlasma code the natural order to do the 
different updates in a time step is (see Fig. 3) 

1. Update the proper velocity uj p of the macro par- 
ticles using either the Boris or Vay particle pusher 

2. Calculate the charge density on the mesh p l (x p ), 
update the macro particle positions to x p +1 , and 
recalculate the charge density p i+1 (x p +1 ) 

3. If using a charge conserving method, find the cur- 
rent density from the time derivative of the charge 
density d t p = [p l+1 {x p +1 ) — p l (x p ))/At, else calcu- 
late it directly from the particle flux u p interpo- 
lated on the mesh. 

4. Given the time staggered current density, update 
the magnetic field using an implicit method 

5. Combine the old and new magnetic field values to 
make a time centered update of the electric field 

In between step 3 and 4 first the particle-, charge- and 
current boundary conditions are applied, together with 
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FIG. 2. Spatial staggering in the PhotonPlasma code. 
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tion of the velocity. There exists higher order symplectic 
integrators with several substeps of the general positions 
and momenta composing a full time update. But only 
for a subset of these integrators the position is always at 
the mid-distance in time between the current and new 
momenta; a necessary condition to be able to update the 
particle momenta. 

We have implemented a symmetric fourth order sym- 
plectic integrator 8-10 in the code. Due to its symmetric 
nature, a single, full timestep is performed by simply tak- 
ing four leap frog substeps, making it an almost trivial 
change to implement, with updates needed only in the 
main driver. The method significantly increases the long 
term stability of the evolution for e.g. streaming plasma 
simulations, at the price of increasing the cost of a sin- 
gle time step by a factor of 4. For three dimensional 
simulations, where increasing the resolution by a factor 
of 2 costs a factor of 16 in cpu time this can be very 
worth while, compared to using the second order leap 
frog method with a \[2 higher resolution. 



A. Particle motion 

To move the macro particles forward in time we have to 
solve Eq. 9. Because of the time staggering it is straight 
forward to make an 0(At 2 ) precise position update 



FIG. 3. Time staggering and integration order in the Pho- 
tonPlasma code. wy p and J are staggered At/2 backwards 
in time, while everything else is time centered. 

an exchange of particles between MPI threads. Then the 
particle array, local to each thread, is sorted sequentially 
according to the cell position. The sorting assures op- 
timal cache locality when computing the charge density, 
and step 1 to 3 can be executed in one sweep, and us- 
ing small cache friendly buffer arrays for accumulating 
p c and dtp c or J. The boundary conditions for the mag- 
netic and electric fields are applied while solving their 
time evolution in step 4 and 5. The different steps are 
described in more detail below. 

The time update shown above corresponds to a sec- 
ond order Leap Frog method. Compared to for example 
Runge-Kutta methods, or other methods where all vari- 
ables are time centered, the advantage is that it requires 
zero extra storage for intermediate steps, and the par- 
ticle update is symplectic, giving stable particle orbits. 
In general, symplectic integrators use a pair of conjugate 
variables. In this case they are the position, electromag- 
netic fields and charge density and the particle proper 
velocities and current densities. As detailed below, the 
leap frog update is only possible because we evaluate the 
ux B term in the Lorentz force using an implicit evalua- 



x p +1 = x p + Atu p +1 , (15) 

where the upper index i indicates the iteration number. 
The position x p is evaluated at t l , while the proper ve- 
locity ujp is staggered backwards in time and evaluated 
at f-Va = t l - At/2. 

The proper velocity update is a bit more delicate. To 
find a time centered Lorentz force we need a time cen- 
tered value for the velocity u p . This gives an implicit 
equation for uj p +1 , and traditionally the Boris particle 
pusher 11 has been used. It is formulated so that first the 
time centered proper velocity is computed as the aver- 
age U7 P (i I ) = («7p + uj p +1 )/2, and from that the nor- 
mal velocity u p (t l ) is derived. It is still an option in 
the code, but Vay has showed 12 that the Boris pusher 
is not Lorentz invariant, and gives incorrect solutions in 
simple relativistic test cases. Instead, the Vay particle 
pusher calculates tx p (i l ) = {u l p + u p +1 )/2 directly, as the 
average of the two three velocities. Even though the re- 
sulting equations for u~f' p +1 involve the root of a fourth 
order polynomial, there is an analytic solution, and the 
end result is a Lorentz invariant proper velocity update. 
Therefore, the Vay particle pusher is now the standard 
option for updating the proper velocity. It is 0(Ai 2 ) 
precise. 
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B. Solving Maxwell's Equations 

In the PhotonPlasma code Maxwell's equations are 
solved by means of an implicit scheme for evolving the 
magnetic field, B l — > B i+1 , followed by an explicit up- 
date of the electric field, E l — > E l+1 . De-centering of 
the integrator may be employed, such that the implicit 
magnetic field term, B l+1 , is weighted higher than the 
explicit term, B l , by setting a parameter a, at run ini- 
tialization. 

With this de-centering of the implicit averaging of B 
taken at two consecutive time steps, the discretized forms 
of Faraday's and Ampere's Laws read 



B i+i _ B i 



At 



and 



E i+i _ E i 
Af 



= -V+ x aE 



V" x (aB l 



i+l 



■/3E*) 



(16) 



- /3B*) - /i J 



i+l 



(17) 

where, a and /3 = 1 — a with a > 1/2, quantifies the for- 
ward de-centering of the implicit magnetic field term and 
the high frequency wave damping strength and spectral 
width that accompanies the choice. The hat and sign 
denotes that it is the discrete version of the differential 
operator and that it is applied downwards or upwards, i 
indicates the iteration. The current is downstaggered in 
time, and the i + l iteration is already calculated from 
the macro particle distribution, when starting to solve 
the Maxwell equations. 

Isolating E i+1 in Eq. 17 and taking the curl, we can 
insert it in to Eq. 16. Using the vector identity V x V x 
B = V (V • B) V 2 .B, which also holds for the staggered 
discretized operators, and using V • B = 0, produces an 
elliptic equation for B l+1 



(1 



a^ArV^)B l+1 =B l + c'apArVB 
- At\7+ x E i 



c 2 a/z ArV+ x J 



i + l 



(18) 



The right hand side of Eq. 18 contains only known terms 
(at time U for E l and B\ and U + At/2 for J l+1 ), but 
the operator on the left hand side is elliptic, complicating 
a direct solve. We have implemented a simple iterative 
solver taking B l+1 — B % as a first guess, with a solu- 
tion found by successive relaxation. Elliptic equations 
are non-local and our solver requires repeated updates of 
boundaries and ghost-zones. However, the limitation to 
the parallel scalability is not serious, in that convergence 
is normally reached in 1-10 iterations for most simulation 
setups, with tolerances on the residual error of about 
10~ 6 . Once the relaxed solution has been found, and 
provided that the initial simulation setup had V • B = 0, 
going forward V • B = is guaranteed, due the constraint 
(V • B = 0) being implicitely built into the derivation of 
Eq. 18. Having found B t+1 , the electric field is simply 
updated explicitly by Eq. 17. 



%E z ^d t Bx j+1/2 



Ezj.2 Ezj.j Ezj Ez j+1 Ez j+2 Ez j 



'J+3 



FIG. 4. Example of the sixth order difference operation Vx in 
ID (along the y-axis). Due to the Yee mesh staggered layout 
of variables, the central difference is computed exactly where 
needed. The differential operator called 'ddyup' (in the code) 
produces the derivative w.r.t. the Y-axis, up-shifted one half 
mesh point on the axis. For the example in the figure this 
produces the correct derivative of B y at the desired mesh 
point location, j/j +1 / 2 . In the nomenclature adopted here, 
this operator is denoted 6 dy ■ Cell centers are marked in blue, 
and cell edges in red. Further, compare this figure with the 
respective components in Fig. 2. 



The field integrator is unconditionally stable for 
1/2+ < a < 1~. However, the scheme tends to damp 
out high-frequency waves due to the de-centered implicit 
nature of the scheme, and the solver is only second or- 
der accurate for values a ~ l/2 + . Besides providing 
a tunable stabilization of the field integration scheme, 
this parameter also determines how large a time step can 
be chosen, and the damping of high frequency waves. 
Empirically, with our highest order scheme (6 th order 
fields, PCS particle assignment), for values a > 0.525 
the scheme is numerically stable. 

Forming spatial derivatives of the field quantities is 
done by finite differencing on a uniform (but not necessar- 
ily isotropic), mesh. A set of operators identical to those 
used in the Stagger code - also developed and main- 
tained in Copenhagen 13 - are implemented, with a choice 
between 2 nd , 4 th and 6 th order accuracy in space. Stag- 
gering of the variables on a Yee lattice 7 leads to highly 
simplified computations for the difference equations. For 
example, for Faraday's Law (Eq. 16) the rr-component, 



namely [dtB] 



= d y E z 



-d z E y , along the y — axis reduces 



to dtB x — d y E z . From Fig. 2 we see that this com- 
putation yields the desired value exactly where needed, 
provided that we compute the central differences at the 
half-staggered mesh point; a single component of the Vx 
operator is illustrated in Fig. 4. 

In the PhotonPlasma code, for the 6 th order accu- 
rate finite difference first derivative with respect to the 
y-axis the expression reads 



6 5+ f ( U+i fj 

"■'" J:2 " { Ay 

' fj+2 - I j -I 



" ' ' 2 " V Ay 
b / f ' 



Ay 

fj+3 ~ fj-2 

Ay 



(19) 



with coefficients a = 25/21, b = -25/384 and c = 3/640, 
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matching those of the Stagger code. For our example 
of Faraday's Law above, the corresponding expression to 
compute becomes 



3 d+E z (y J+1/2 ) = 



25 { E z (y j+1 ) - E z (y 3 ) 



21 V Ay 

-25 ( E z (y j+2 ) - E z { yj --s) 



384 V Ay 

3 ( E z {y 3+3 ) - E z (y 3 _ 2 ) 



640 



Ay 



(20) 



for that component along the y-direction. 

This choice of coefficients (based on a Taylor expan- 
sion) for the higher order differential operators has en- 
abled us to consistently port simulation data from the 
Stagger code to the PhotonPlasma code, for cou- 
pling of MHD simulations and PIC simulations. 



C. Charge conserving current density 

Assuming the charge and current density on the mesh 
to be volume averaged, just like the fields, and inserting 
the phase space density of a set of macro particle (Eq. 
8) into Eq. 6 for the charge density, we find the charge 
density in a cell as 



Pc(x c ) = ^2q p w p W(x p 

particles 



(21) 



where q p and w p are the charge and number density of the 
macro particle. Another reason for choosing volume av- 
eraging, just like with the electromagnetic fields, is that 
the computation of the fields at the particle positions and 
the charge density on the mesh has to use the same inter- 
polation technique, or particles can induce self- forces 3 ' . 
In principle, one could use a similar definition for the 
current density 



J c (x c ) = ^q P w p u p W(x p - x c ) , 

particles 



(22) 



but for the discrctized version of Gauss law (Eq. 2) to 
hold true the correspondingly discretized charge conser- 
vation law, 



dtpc + V • J = , 



(23) 



has to be satisfied. In general, with the above defini- 
tions for the charge and current densities, this will not 
be the case. While it holds for particles moving inside a 
cell, particles that move through a cell boundary in a sin- 
gle time step can violate charge conservation. Methods 
have been developed 14-17 for second order field solvers 
that instead of using Eq. 22 to compute J use differ- 
ent schemes to directly find J via Eq. 23. In particular 
Esirkepov showed 16 how to make a unique linear decom- 
position of the change in charge density d t p c for arbitrary 



shape functions into different spatial directions, and de- 
couple Eq. 23 into a set of differential equations, each 
involving only one component of the current density 



<9 4 Ji = dtpc 



(24) 



The directional time derivatives of the charge density Dpi 
are computed on a macro particle basis, as described by 
Esirkepov 16 , and for a second order differential operator, 



l d7J,. 



Ax, 



(25) 



it is straight forward to compute Ji from a single par- 
ticle with a simple prefix sum, assuming that the con- 
tribution of a particle to the current density has com- 
pact support on the mesh and far away is zero. In the 
PhotonPlasma code the current density is found us- 
ing this method when a second order field solver is in 
use. Note that all PIC codes based directly on the pub- 
lished charge conserving methods 14-17 work with second 
order field solvers only, since the order of the discretized 
differential operators for Gauss law and charge conser- 
vation has to be the same. The Esirkepov method has 
a very concise formulation and is well suited to general- 
ize to higher order field solvers, but it is not clear how 
to couple the methods by Eastwood 14 , Villasenor and 
Buncman 15 , or Umeda 17 to higher order field solvers. 

For higher order field solvers it is more complicated 
to solve Eq. 24. For example a sixth order differential 
operator involves sixth mesh points 



b 97Ji 



Ji —)+KJt-J~) + < J i~ J l )1 ( 26 ) 



and the simple prefix sum that can be used at second 
order turns into a linear set of equations. It is also not 
clear what the boundary conditions, even for a single 
particle with compact support, should be. In the simplest 
case, for a single particle, the matrix will look something 
like 



-b 



b 

a J 



( J t \ 



\jf + J 



( 



\ 



Vpf 








3+ 



(27) 



/ 



where we for simplicity have absorbed Axi in the defini- 
tion of a, b, and c. There are several problems with using 
the above set of equations: i) it is not clear how the "no 
current far away" condition is implemented, and because 
of the few points involved the cutoff will have some im- 
pact on the result ii) the equation for a single macro par- 
ticle with cubic interpolation easily involves a 10 x 10 ma- 
trix, and iii) as formulated above, the problem is highly 
unstable, because the coefficients are alternating and of 
very different size (a = 25/21, b = -25/384, c = 3/640). 
Furthermore to solve the above system of equations for 
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three directions and every single macro particle is both 
very costly, and will introduce noticeable round-off error 
on the mesh, when all contributions from all particles 
are summed up. In the PhotonPlasma code we have 
developed a fast and stable alternative, similar in struc- 
ture to the one published recently 18 for the Aladyn code. 
Instead of solving the current density for each macro par- 
ticle, taking advantage of the linearity of the problem, we 
sum up Dpi directly on the mesh. Then Eq. 26 can be 
solved on the whole domain. There is still the problem of 
the alternating coefficients, but that can be dealt with, 
by first solving for the difference AJi = Jf — Jf, and then 
using a prefix sum, just like in the second order method, 
to find Ji(xi). In terms of the differences the differential 
operator becomes 

6 arj^5(AJ+++AJ--)+6(AJ++Ajr)+aAjO , (28) 

where the coefficients are a = (a + b + c)/Axi, b = 
(b + c)/Axi, c = c/Axi, and the corresponding linear 
system, which now has to be solved on all of the mesh, 
is a symmetric penta-diagonal system 



fa 
b 







a o 

b c 
a b c 







: o 

. . . c b a b c 

... c b a b 

\0 ... a b a J 



V AJ" j 



f v p\ \ 
I Vp\ \ 



\ Vp? I 



(29) 



Pentadiagonal systems have stable and efficient solvers 19 , 
and it costs a negligible amount of cpu time compared to 
the rest of the code to find the current density on the 
mesh, J, from the decomposed time derivative in the 
charge density, Tip. Furthermore, because the evalua- 
tion is done directly on the mesh, the accumulated er- 
rors in the charge conservation are smaller than if the 
current density is computed on a macro particle basis. 
To avoid having to specify boundary conditions for T>p 
and introduce new terms into the matrix in Eq. 29 we 
use an extra virtual cell layer. Before a timestep, by 
definition, the charge density will always be zero in the 
virtual cell layer, and the current density can then be 
computed correctly with a "zero-current" ansatz for the 
current density on the lower boundary. Afterwards the 
normal boundary conditions for the current density can 
be applied, just like when using Eq. 22 for computing the 
current density. This method also completely decouples 
the solution, when running the code with multiple MPI 
threads. If instead of a sixth order field solver a fourth 
order field solver is used the system Eq. 29 becomes tridi- 
agonal. 

In the original version of the PhotonPlasma code 
Eq. 22 is used to compute the current density. The er- 
ror introduced into Gauss law (Eq. 2) is mostly on the 
Nyquist scale, and a simple iterative Gauss- Seidel filter- 
ing technique is used to correct E. The module also com- 
putes the error, and has been used to validate the charge 
conserving methods. In most applications, the relative 



error with the old method can be kept on a 10~ 4 -1CP 5 
level by running the filter 5 to 10 times per iteration, 
but there is no unique way to correct the electric field, 
and the divergence cleaning introduces tiny electric fluc- 
tuations, which couple back to the particles through the 
Lorentz force. Apart from the higher cost of the elliptic 
filter when running on many cores, the method is worse 
at conserving energy and has a larger numerical heat- 
ing rate for cold plasma beams than a charge conserving 
method, and without a current filter it can be unstable. 
When running the code with charge conservation and in 
single precision at high resolution (e.g. ^lOOO 3 cells and 
10-100 billion particles), over time the numerical round- 
off noise will eventually build up errors both in Gauss' 
law and in the solenoidal nature of the magnetic field (Eq. 
3). A single filtering step on the electric and magnetic 
fields every ^5 th iteration is enough to keep the relative 
errors at the 10~ 5 level. When running the code in dou- 
ble precision we have not seen any need to use divergence 
cleaning, and if the initial and boundary conditions obey 
Eqs. 2 and 3, then the relative error typically stays below 

~io- 7 . 



IV. RADIATIVE COOLING 

Very high energy electrons loose momentum by emit- 
ting radiation. The emission in itself is a very valuable 
diagnostic and the extraction of the spectrum is treated 
below, but the energy loss for the high energy particles is 
normally not computed in particle-in-cell codes. Build- 
ing on the work on radiative losses done by Hededal 20 
in our old PIC code, we have developed a numerically 
stable method for correctly calculating radiative losses 
in the PhotonPlasma code. The radiated power from 
a single particle P ra d can be written as 20 
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(30) 



where dot denotes the time derivative. 

Denoting the proper velocity p = u-f, and using the 
identities p ■ p = 7 4 it • u, and p = 711 + c~ 2 -f 3 u(u ■ u), 
we can rewrite Eq. 30 to 
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(31) 



Notice how Eq. 31 only involves proper velocities, and is 
numerically stable at both high and low Lorentz factors, 
while the first version relies on a cancellation between 
7 2 and p 2 , and Eq. 30 uses three velocities, prone to 
numerical errors. 

The radiative cooling always acts in the opposite di- 
rection to the proper velocity vector, and therefore the 
change in the length of the proper velocity vector is di- 
rectly related to the change in the kinetic energy and 
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A. Particle Injection 
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(32) 



where we have used that pp = c 2 -fj, and mc 2 (j— 1) = 

-frad • 

In the PhotonPlasma code the Boris or the Vay 
pusher is used to advance the particles. To integrate the 
effect of radiative cooling, for simplicity and to keep the 
scheme explicit, we assume that the cooling in a single 
timestep only changes the energy with a minor amount. 
The particle pushers advances the four velocities from 
time step t — At/ 2 to t + At/2, while accelerations are 
computed time centered at t. Below we denote the three 
times with — , +, and respectively. 

To get a time centered cooling rate p , po, and 70 are 
needed. The proper acceleration po is already naturally 
time centered, but with the Vay Pusher, for consistency 
one should use the time centered three velocity to com- 
pute wwo and 70 , which however is numerically imprecise 
when using single precision. To get a more numerically 
precise, albeit ever so slightly inconsistent, measure for 
po and 70 we use the time averaged proper velocities 



Po = 



P++P- 



7o 



-4 



(33) 



These values can then be plugged into Eqs. 31 and 32 to 
find the change in momentum due to radiative cooling as 



P+ = P + PoAt 

= P + (pEM +prad)At 
= P+ M + PradAi . 



(34) 



In principle one could find the converged solution to the 
above non-linear equation system, under the assumption 
that cooling is a small correction, by iterating the ra- 
diative cooling computation a couple of times, while up- 
dating p + and po- But in practice we have found that 
the direct explicit calculation of the cooling rate done by 
making no iterations is acceptable. 



V. INITIAL AND BOUNDARY CONDITIONS 

Setting up consistent boundary and initial conditions 
for a particle-in-cell code can be non-trivial, due to the 
mixed particle and cell nature and the staggering in space 
and time. In the PhotonPlasma code the initial and 
boundary conditions are supported through the loading 
of different boundary and initial condition modules, and 
over time a number of modules have been developed. 



Macro particles in a particle-in-cell code sample phase 
space and are by construction placed using a random gen- 
erator. Any call to a random generator is related to the 
position in the mesh, where the pseudo-random number 
is needed. To make the random number generation scal- 
able, but independent of the parallclization technique, we 
have implemented two types of random generators: i) We 
have developed a multi stream variant of the Mersenne 
Twister random generator 21 to generate high quality ran- 
dom numbers, and setup one stream per xy-slice of cells. 
This generator is used in e.g. shock simulations where 
the particle average density and velocity is a function of 
a single coordinate, ii) In the case of more complicated 
setups, e.g. when using snapshots from MHD simulations 
as described below, a simpler random generator is used, 
where the state is contained in a single 32-bit integer, 
but each mesh point has a its own state. The Mersenne 
Twister generator is used to generate the initial seed in 
each cell for the simple random generator. To sample 
the velocity phase-space we have implemented cumula- 
tive 2D and 3D relativistic Maxwell distributions 22 using 
an inverted lookup table 3 . It is important to use the 
correct dimensionality when initializing the velocities, or 
the corresponding 2D or 3D temperature will be incor- 
rect. The particle positions are correspondingly injected 
either uniformly in an xy-slice or in a single cell. Notice 
that using an injection method in a full xy-s\\ce allow for 
larger density fluctuations inside the cells, while if using 
a cell-by-cell injection method there will only be fluctu- 
ations on the sub-cell level. Depending on the physical 
model on hand one or the other method may be more de- 
sirable. We typically use a standard technique to inject 
particles in pairs of different charge type to avoid having 
free charges initially. But the code is flexible, and has 
a built in module to correct Gauss law. Therefore it is 
also possible to inject particles completely at random and 
correct the electric field to include the non trivial effect 
of electrostatic fluctuations from the initial non-neutral 
charge distribution. 



B. Boundary conditions 

Wc handle periodic boundaries trivially by padding the 
domain with ghostzones, a technique also used at the 
edge of MPI domains, and copying fields from the top 
of the domain to the bottom and vice versa while up- 
dating boundaries. The particles, on the other hand, are 
simply allowed to stream freely and the position is in all 
but a few cases calculated modulo the domain size. To 
maintain uniform numerical precision even for very large 
domains the position is decomposed internally as an in- 
teger cell number, and a floating point number giving the 
fractional position inside the cell. 

Reflective boundaries are implemented using "virtual 
particles" on the other side of the reflecting boundary, 
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taking in to account the symmetries of the Maxwell equa- 
tions and the staggering of the mesh. By convention the 
boundary is placed at the center of the cell (i.e. the 
charge density is on the boundary), below for simplicity 
taken to be an upper boundary. When the charge and 
current densities on the mesh are calculated for each par- 
ticle inside the boundary a corresponding virtual "ghost 
particle" outside the boundary has to be accounted for. 
Particles close to the reflecting boundary will contribute 
to the charge and current densities outside the boundary, 
while the virtual particles will contribute a correspond- 
ing charge and current density inside the boundary. This 
can most efficiently be calculated by disregarding the 
boundary at first after a particle update, and calculate 
the charge and current density as if the particles were 
streaming freely. The contribution to the charge and 
current densities inside the boundary from the virtual 
particles can then be calculated by taking into account 
the symmetry. It is easily seen that this corresponds, 
up to a sign, to the contribution of the normal particles 
outside the boundary 

p c (x h - Sx) = p c (x b - Sx) + p c (x b + 5x) (35) 
J||(a;b - Sx) = J\\(x h - Sx) + J\\(x b + Sx) (36) 
J±(x b — Sx) = J±(x b — Sx) — J±(x b + Sx) , (37) 

where Xh is the position of the boundary, Sx is the dis- 
tance; i.e. integer Ax for centered quantities, and half in- 
teger for staggered, || indicates components parallel and 
_L perpendicular to the boundary. The perpendicular 
component of the current density is staggered and anti- 
symmetric across the boundary. Only after calculating 
the current density on the mesh are all particles outside 
the boundary reflected according to 

x — > 2 Xb — x «7|| — > «7|| v-f± — > — vj± . (38) 

When the charge and current densities are correctly cal- 
culated inside and on the boundary we can start consid- 
ering the values outside. The staggered fields, i.e. the 
perpendicular current density and electric field, the par- 
allel magnetic field, and the magnetic potential, may be 
shown to be antisymmetric across the boundary 



J±(xb + Sx) = 


-J±(xb - 


- Sx) 


(39) 
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(42) 



Conversely, the centered fields (in the direction of the 
boundary), charge density, parallel current densities, 
electric fields, the electric potential, and the perpendicu- 
lar component of the magnetic field are symmetric across 



the boundary 
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(47) 



and we only need to determine the values of the centered 
fields at the boundary. The charge and current densities 
are derived from the particle distribution and are there- 
fore already given at all points, including the boundary. 
B±(xb) is the only unknown component of the magnetic 
field, and it can be computed from the solenoidal con- 
straint V+ ■ B = calculated at the boundary. Given 
that the magnetic field is the first to be evolved forward in 
time it is then possible to self consistently calculate the 
parallel electric field on the boundary E\\(x b ), directly 
from the evolution equation. 

Outflow boundaries are less constrained than reflecting 
boundaries, given the extrapolating nature, and various 
types of damping layers and extrapolations have been 
considered 23-25 . We use a damping layer of mesh points 
- typically 20 — to damp all perpendicular compo- 
nents of the electromagnetic fields, effectively absorbing 
reflected waves, and combine it with an extrapolating 
boundary condition (i.e. symmetric first derivative). We 
allow particles to still generate current and charge den- 
sities on the mesh inside the box until they are well out- 
side the boundary. As long as the disturbances in the 
outgoing flow are small this works well and is stable for 
extended run times. 



C. Sliding window and injection of particles 

In highly relativistic flows it can be advantageous to 
simulate the plasma in a frame where the region of inter- 
est is relativistic. This constrains the time such a region 
can be followed, because the computational domain has 
to be continuously expanding or has to have an enormous 
aspect ratio in the flow direction. This is the case for ex- 
ample for relativistic collisionless shocks. An alternative 
to this is to use a sliding window as the computational 
domain centered on the region of interest and moving 
with the same velocity maintaining it at the center of 
the box 26 . We have implemented this technique in the 
PhotonPlasma code, together with a moveable open 
boundary and particle injector. If the window is moving 
with a velocity v relative to the lab frame then in each 
iteration we check if v(U — i id) > Ax and in that case 
we move the box a full mesh point, if necessary. I.e. if 
v = 0.5c and At — 0.5Ax/c the code will roughly move 
the window one point in every 4 iterations. The move is 
implemented by removing one cell at one end of the box, 
translating everything one point, and injecting an extra 
layer of inflow particles at the other end. This technique 
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was used to successfully capture the long term evolution 
of an 3D ion-clcctron collisionless shock 27 . 



D. Embedded particle-in-cell models 

MHD models have successfully been applied for many 
years to study the large scale structure of plasmas from 
laboratory length scales to the largest scales in the uni- 
verse. MHD can reach over such enormous scales, be- 
cause a statistical description of the plasma is employed, 
where the microscopic state is captured using statistical 
quantities like the temperature, viscosity and resistivity. 
On the other hand, the kinetic description of a plasma 
used in a particle-in-cell code is ideally suited to investi- 
gate non-thermal processes, such as particle acceleration 
and particle- wave interactions, and can be used to model 
and understand a much broader range of plasma instabili- 
ties than MHD. The drawback is that exactly because the 
plasma is described in kinetic terms, explicit particle-in- 
cell codes, like the PhotonPlasma code, have to respect 
microscopic constraints and resolve the Debye length, the 
plasma skin depth, and the light crossing time of a single 
cell. Recently, we have developed a technique to couple 
the two approaches using the results from MHD simula- 
tions to supply initial and boundary conditions for our 
PIC code. This has enabled us to for the first time make 
a realistic particle-in-cell description of active coronal 
regions 28,29 , and investigate the mechanism that acceler- 
ates particles in the solar corona. The MHD code is used 
to evolve the global plasma over several solar hours, and 
a snapshot just before e.g. a major reconnection event is 
used to study a small time sequence, of the order of tens 
of solar seconds, using the PIC code. 

Given an MHD snapshot, typically a smaller cutout 
from a larger simulation of the region of interest, we first 
interpolate to the resolution that is to be used in the 
PIC code. The interpolated magnetic fields are corrected 
with a divergence cleaner that uses the same numerical 
derivative operator that are used in the PIC code, to 
assure that the initial magnetic fields are solenoidal to 
roundoff precision. In a given cell the density in the MHD 
snapshot is used to set the weight of individual particles. 
The particles are placed randomly inside the cell, but 
in pairs, so that initially there are no free charges. The 
velocity of the particles have three contributions, 

«7 = «7bulk + ^thermal + «7curront ■ (48) 

The bulk momentum is taken from the MHD snapshot. 
The thermal velocity is sampled from a Maxwell dis- 
tribution using the MHD temperature, and finally the 
current speed is found from the ideal MHD current 
fia J = [i J2i ^^currcnt = V x £?. The average mo- 
mentum has to correspond to the bulk momentum in the 
MHD snapshot. Taking into account the mass ratio, then 
for example in a neutral two component proton-electron 



plasma, with n = pmhd/(™c + m p ), the weighting is 

g TYlp ip if%£ _ _ 

Vcu " cnt = ~7^Wp x Vcu " cnt = Mp x ' 

(49) 

and in general the correct way is to use harmonic weight- 
ing. Finally, an initial condition has to be specified for 
the electric field E. One possibility is E = 0. It satis- 
fies Gauss law - the plasma is neutral initially - but is 
inconsistent with the EMF from the MHD equations. If 
using this initial condition, in the beginning of the run 
a powerful small scale electromagnetic wave is launched 
throughout the box, when the dtE term in Ampere's 
law adjusts the electric field on a plasma oscillation time 
scale. Another possibility is to set E = —u x B, in ac- 
cordance with the ideal MHD equations. Then there is 
no guarantee that Gauss law is satisfied. In the code the 
second choice is used, but small scale features in the elec- 
tric field are corrected by running the build-in Gauss law 
divergence cleaner for a few iterations. The remaining 
difference is adjusted by changing slightly the ion- and 
electron-density, making the plasma charged. Typically 
this only leads to small scale changes. The resulting ini- 
tial electric field then both satisfies Gauss law, and is 
almost in accordance with the MHD EMF. Apart from 
using the MHD EMF, we have also options to add the 
Hall and Battery effect terms. 

To evolve the model, boundary conditions on all six 
boundaries are needed. For the plasma they are con- 
structed exactly like the initial conditions. They can 
easily be made time dependent, by loading several MHD 
snapshots and interpolating in time. In every time step, 
when applying the boundary conditions, first all particles 
in the boundary zones are removed - also particles that 
have crossed the boundary from the interior of the box - 
and are then replaced with a fresh plasma, according to 
the MHD snapshot. The boundary is typically 3 zones 
broad; enough to allow for the sixth order differential op- 
erators on the interior of the mesh, and enough to make 
a well defined charge and current density with the cubic 
spline interpolation. This plasma is retained when evolv- 
ing, and particles from the boundary zones are allowed to 
cross into the interior of the computational domain. By 
maintaining a correct thermal distribution in the bound- 
ary zones, and simply letting the dynamics decide which 
particles stream into the box, the inflow maintains a per- 
fect Maxwell distribution, and the resulting plasma is 
practically identical to what would have been obtained 
with the open boundary method of Birdsall et al 3 , but 
is much simpler to implement, and correctly accounts for 
bulk velocities and currents in and out of the box. If there 
is a differential between the charge or electric current in 
the boundary. Inside the domain the d t E term adjusts 
the plasma almost instantaneously, and the balance is 
maintained. This boundary condition is very similar to 
a perfect thermal bath, but with in- and out-going bulk 
velocities and currents. 

We do not keep the fields fixed at the boundary con- 
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dition, but instead let them evolve freely, only subject to 
reasonable symmetry conditions at the boundary, which 
keep them consistent with the Maxwell equations. To 
respect the symmetry of the equations we let 

• E± symmetric, E\\ antisymmetric 

• d±B± symmetric, By symmetric 

• p c and J specified according to MHD snapshot 

• (j)E symmetric, (f>B antisymmetric, 

where || are the components parallel with the bound- 
ary and _L the component perpendicular to the bound- 
ary. For the relatively short times that we have evolved 
imbedded PIC simulations 28,29 these boundaries are sta- 
ble. 

A severe limitation for coupling PIC and MHD codes 
is that in many situations the Debye length, plasma fre- 
quency and other microphysical length and time scales 
are many orders of magnitude smaller than the scales 
of interest. For example, in the solar corona the De- 
bye length is measured in millimeters, while interesting 
macroscopic scales are measured in megameters. If we 
were to simulate the true system using an explicit PIC 
code we would need roughly (10 8 ) 3 cells, which is compu- 
tationally unfeasible in the foreseeable future. To circum- 
vent this problem we have developed a novel method, in 
which we rescale the physical units while maintaining the 
hierarchy of time, length and velocity scales. The Pho- 
tonPlasma code is flexible and can be employed with 
a range of different unit systems. Furthermore all natu- 
ral constants are maintained in the code. The rescaling 
technique is discussed in detail in Baumann et al 28 . 



VI. DIAGNOSTICS 

A. Particle tracking and field slicing 

Particle-in-Cell simulations are routinely run with bil- 
lions of particles, with each particle taking up ^50 bytes, 
and for 10 3 -10 6 iterations. To store the full data set from 
every single iteration would take up petabytes of storage, 
and is impracticable. Instead, a standard practice when 
running PIC simulations is to only store every n th par- 
ticle in a snapshot, and dump snapshots with a reduced 
frequency, decreasing the data volume dramatically. But 
to understand the underlying physics of for example par- 
ticle acceleration in detail a more fine-grained approach is 
warranted. To that end we have implemented dumping of 
field slices and tracking of individual particles. Any par- 
ticle in the code can be tagged for particle tracking, ac- 
cording to a number of criteria. For example based on its 
energy, at random, or according to the specific ID of the 
particle, which is reproducible between runs. The tagged 
particles are harvested by each MPI thread individually, 
and together with the position and momentum the lo- 
cal values of the current, density, electric and magnetic 



field are recorded, by interpolation from the mesh to the 
particle position. Everything is arranged in a single ar- 
ray that is sent to the master thread. The master thread 
then dumps the particle records to a single file, appended 
to in each iteration. On x86 clusters we can sustain trac- 
ing a million particles without significant performance 
degradation, while on clusters with weaker CPUs, such 
as BlueGene/P, we are limited to <~ 10 5 particles. To 
put the particle tracks into context, data on of the field 
evolution is also needed. To save time-resolved field data 
we have made a field slicing module, where a large selec- 
tion of fields (e.g. the electric field E, E B, J x B etc) 
may be stored as 2D slices. The extent of a slice, and 
the number of field layers in the perpendicular direction 
to the slice, used for averaging, is user selectable. These 
two techniques have been used in concert, to understand 
the mechanism behind particle acceleration in reconnec- 
tion events in the solar corona 28 ' 29 , and in collisionlcss 
shocks 27 . Both diagnostics a interactively steerable: pa- 
rameters can be changed, and diagnostics can be turned 
on and off while a simulation is running. 



B. Synthetic Spectra 

The radiation signature from an large number of of ac- 
celerated charges is not easily computed analytically from 
first principle, for plasmas with complex fields topologies, 
rich phase space structure, and temporal evolution. Ap- 
plication examples include relativistic outflows and col- 
lisionlcss shocks; more specifically, for example, gamma- 
ray bursts, where magncto-bremsstrahlung is very likely 
to constitute a major part of the observational signal. 

However, since particle- in-cell codes automatically pro- 
vide all variables needed for producing a radiation spec- 
trum, namely r (position), f3 (v/c, velocity), and (3 (v/c, 
acceleration), a radiation spectral synthesizer has been 
integrated into the PhotonPlasma code. We need only 
designate observer position(s) and match the frequency 
range to the plasma conditions to complete the setup 
for the computing the radiation integral (Eq. 50 below). 
During run-time the synthesizer computes the radiation 
signature for an ensemble of charged particles (in most 
cases electrons) in the simulation volume,; the formula 
for the spectrum is given by 
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(50) 



with t' the retarded time and n the direction of the ob- 
server. A first comprehensive and thorough study of the 
spectral synthesis method is given by Hededal 20 , which 
also covers a range of test examples. While in that study, 
the spectral synthesis was done as post-production, in 
the PhotonPlasma code all parts of the integration 



12 



are done at run-time, with very little overhead, even for 
large numbers of particle traces 30 ' 31 . 

The discretization of Eq. 50 is done in four parts: 

1. Frequency range is specified as an interval and is 
discretized into Nu, bins, typically of order 10 3 , ei- 
ther with linear or (more often in practice) with 
logarithmic binning. 

2. Observer positions, often more than one (N b s > 1), 
are specified at run initiation time (input), typically 
with directionality perpendicular to a sphere cen- 
tered on the simulation volume, or any important 
direction. 

3. Time subsampling may be chosen; this partitions 
the integration for every simulation time step into 
a number of subcycled integration intervals: Subcy- 
cling is employed on particles selected for synthetic 
tracing since, for highly relativistic situations, the 
retarded electric field can be extremely compressed 
in spikes (for example in the case of synchrotron 
motion with j(v) <C 1). In such situations the sub- 
cycling provides a much cheaper alternative than to 
restrict the Courant condition for the entire simu- 
lation. 

4. Radiative regions are defined in a uniform mesh- 
ing (independently of MPI and simulation mesh 
geometries), which gives the advantage of offering 
the possibility to sample very local volumes of the 
plasma. This may be of interest in simulations with 
— at the same time — subvolumes of very high and 
low anisotropy, such as is the case in fully resolved 
shock simulations, and relativistic streams. 

A seamless integration into the PhotonPlasma code 
has made the synthesis module computationally efficient, 
and due to the embarrassingly parallel nature of the spec- 
tra collection procedure plasma simulations have been 
run with millions of particles used for sampling the syn- 
thetic spectra. Particles are chosen for spectral integra- 
tion before or during a run by tagging for synthetic sam- 
pling. The detailed sampling is important in highly rel- 
ativistic cases with bulk flows - for example when in- 
vestigating the radiation signature from relativistic colli- 
sionlcss shocks, and streaming instabilities, more gener- 
ally 20 ' 30 - 31 . 

VII. BINARY COLLISION OPERATORS 

The classic particle-in-cell framework does not take 
into account physical collision processes, and all particle 
interactions are mediated through the electromagnetic 
fields on the mesh. Low energy electromagnetic waves, 
and large impact parameter electrostatic scatterings be- 
tween charged particles can be resolved directly on the 
grid through particle-wave interaction, but the photon 
energy is limited by the grid resolution and binary electro 



static scattering is not correctly represented. To allow for 
binary interactions, high energy photons, and in general 
interactions of neutral and charged particles and gas drag 
forces, we have to model them explicitly in cases where 
they are of importance, such as in high energy density 
plasmas, and partially ionized mediums. The Photon- 
Plasma code supports the inclusion of particle-particle 
interactions, decay of particles, and allow for neutral par- 
ticles, in particular photons, in the model. The first im- 
plementation of binary interactions — in particular Comp- 
ton scattering — together with tests of the method was 
given by Haugb0lle 32 and Hededal 20 . This first imple- 
mentation motivated the name for the code, the Pho- 
tonPlasma code. The Compton scattering module was 
used to model the interaction of a gamma-ray burst with 
a circumstellar medium 33-35 . Coulomb collisions have 
later been incorporated into the framework, to study par- 
ticle acceleration in solar active regions 36 . 



A. Compton Scattering and splitting of particles 

The classic Monte Carlo approach to scattering is 
based on a cut-off probability: first a probability for the 
process is computed and then it is compared with a ran- 
dom number. If the random number is lower than the 
threshold the scattering for the full macro particle pair 
is carried through, and otherwise nothing happens. This 
probabilistic approach is straight forward both numeri- 
cally and conceptually, but it can be noisy, in particular 
when interaction effects are strong but have low proba- 
bility. 

In the code the natural domain to consider is a single 
cell, partly because that is by definition the volume of 
a single macro particle, partly because some interactions 
(e.g. electro static interactions) are mediated by the grid 
at larger scales. In a PIC simulation typical numbers are 
10 — 10 3 particles per species per cell, and a probabilistic 
approach would result in an unacceptable level of noise. 
Consider a beam incident on a thermal population: The 
first generation of scattered particles may be computed 
relatively precise, but the spectra of later generations will 
require an excessive amount of macro particles, if they 
all represent an equal amount of physical particles, given 
the exponentially lower number density of later genera- 
tions. Another well known consequence is that the pre- 
cision scales inversely proportional to the square root of 
the number of particles. This is a problematic limitation, 
when the higher order generations are important ingre- 
dients of the physics, and the scattering process is not 
just a means to thermalizing or equilibrating the phase 
space distribution. To circumvent these problems, the 
Compton scattering module is instead based on an ex- 
plicit splitting approach for particles using the calcula- 
tion of cross sections and allowing for individual weights 
for each macro particle. To implement the scattering of 
two macro particles we transform to the rest frame of the 
target particle and compute the probability P(n) that a 



13 



Laboratory frame ^r> - 



t 



Target restframe 11 -« ^gF, ■<— ^-k ^n-k 

FIG. 5. Sketch of the scattering mechanism in the code of 
an incident macro particle (shown as blue/dark gray) on a 
target macro particle (shown as red/light gray) resulting in 
the creation of a scattered macro particle pair. 



single incident particle during a single timestep is scat- 
tered on the n target particles. If the incident macro 
particle has weight m, then k = P(n)m particles will 
interact and two new macro particles representing the 
fraction of scattered particles are created (see Fig. 5). In 
the case of Compton scattering, the target will always 
be the charged particle, while the incident is a photon, 
and the scattering amplitude is calculated with the Klein- 
Nishina formula, which covers the full energy range from 
Thompson to high energy Compton scattering. To make 
the process computationally more efficient prior proba- 
bilities can be applied. If instead of selecting all pairs 
in a given computational cell, one only selects pairs with 
a prior probability Q, then the weight of the scattered 
macro particles has to be changed to k/Q. By cleverly 
selecting the prior probability such that for example the 
Thompson regime is avoided, the computational load can 
be greatly decreased. 

Each scattering leads to the creation of a new macro 
particle pair, and if untamed, the number of macro par- 
ticles will grow exponentially. To keep the number of 
particles under control, we use particle merging in cells 
where the number of particles is above a certain thresh- 
old. The algorithm is described in detail by Haugb0lle 32 
and Hededal 20 . 

The Compton scattering module of the Photon- 
Plasma code has allowed us to approach exciting new 
topics in high energy plasma astrophysics, where plasmas 
are excited and populations modified by photons, and 
the back-reaction of the plasma on the (kinetic) photons 
produces interesting and detailed descriptions of for ex- 
ample the production of inverse Compton components 33 , 
and photon beam induced plasma filamentation 35 . 



B. Coulomb scattering 

Coulomb scattering of charged particles in an astro- 
physical context is for example important in order to 
understand the solar chromosphere and the lower parts 
of the corona, where the mean free path is similar to 
the dynamical length scales in the system. In the Pho- 
tonPlasma code we have integrated a collision model, 



where all macro particle pairs in a cell are considered 
for scattering. We calculate the scattering process in the 
elastic Rutherford regime. The collision process is im- 
plemented using a physical description for each macro 
particle pair, starting by calculating the time of closest 
approach. Only if that time is less than At/2 from the 
current time, ie. inside the current time interval, is the 
scattering carried through. In the limit of very small 
time steps this makes the algorithm independent of the 
size of the time step. When calculating the impact pa- 
rameter between two macro particles we have to take in 
to account that each particle represents a large number 
of physical particles, therefore the impact parameter is 
rescaled with the typical distance between each physical 
particle n -1 / 3 , where n is the number density. Because 
macro particles carry a variable weight, we use the geo- 
metric mean of the number density of the particle pairs 
to calculate the effective number density n = (ni?^) 1 / 2 . 

We consider three different regimes, based on the im- 
pact parameter: i) If the impact parameter is larger than 
the local Debye length we assume that Debye screening 
between the two particles is so effective that no scattering 
happens, ii) If the impact parameter is so large that the 
effective scattering angle is less than C radians (normally 
taken to be 0.2 in the code) we use a statistical approach: 
At small angles the scattering angle is inversely propor- 
tional to the impact parameter, and we can replace the 
many small angle scattering by fewer large angle scat- 
terings, comparing the ratio of the cut-off to the impact 
parameter b c /b with a random number. If it is lower the 
scattering is carried through, but using the cut-off im- 
pact parameter b c for greater computational efficiency, 
iii) If the impact parameter is smaller than the cut-off 
parameter b c then we make a detailed computation of 
the scattering. In both of the two last cases the scat- 
tering angle is calculated in the center-of-mass frame as 
an inelastic Rutherford scattering, which conservers the 
energy of each macro particle. 

The explicitly physical implementation at the macro 
particle level of Coulomb scattering is conceptually com- 
pletely different from the Compton scattering module, 
in particular because the main consequence of Coulomb 
scattering is the thermalization and isotropization of the 
plasma. 



VIII. TEST PROBLEMS 

Testing the accuracy and precision of a particle-in-cell 
code is particularly difficult, because of the non linear na- 
ture of plasma dynamics, Monte-Carlo particle sampling, 
and the few examples of realistic test problems with an- 
alytic counterparts. To facilitate cross comparison with 
other codes, below we apply the PhotonPlasma code 
to a set of classic test problems, and in some cases com- 
pare different order splines and differential operators to 
highlight the impact of using high order methods. We 
also give an example of of the more non-standard fea- 
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ture of radiative cooling. Further tests have been pub- 
lished in the case of Compton scattering 20,32 and syn- 
thetic spectra 20 . 

A. Numerical heating and collision artifacts 

Numerical heating is a well studied feature of all PIC 
codes 3 . It happens due to the interaction of particles 
with the mesh; the so-called grid-collisions. If particle 
populations with different energy distributions exist in 
the plasma, the interaction through the mesh will tend 
to equilibrate the kinetic energy of each particle species. 
The equilibration of temperatures happens in a labora- 
tory plasma too, albeit normally at a much slower rate, 
and the difference between a laboratory and the compu- 
tational plasma is the much smaller number of macro- 
particles used to represent the plasma in the latter case. 
It is also worth pointing out that the numerical mesh 
heating is an equilibration of kinetic energies, and as 
such much more severe for ion-electron plasmas. It is im- 
portant to keep this energy equilibration in mind when 
simulating plasmas with greatly varying temperatures, 
or when analyzing heating rates due to real physics. If 
the numerical heating rate is close to the physical rate 
in question, the results cannot be trusted. It is interest- 
ing to note that the heating rate is practically invariant 
with respect to the numerical technique used, and instead 
mainly depends on the number of time steps taken. 

The test is done in two dimensions, with three velocity 
components. There is no bulk velocity, and the kinetic 
energy corresponds to a thermal velocity of both ions 
and electrons of v t h.e — Vth,i = 0.1c per component, or 
££ in = 0.015 m s c 2 . The size of the box is 12.8 2 elec- 
tron skin depths with 10 cells per skin depth for a 128 2 
resolution. We perform two tests with 5 and 50 parti- 
cles per cell. The mass ratio is mi/m e = 16. We used 
TSC or cubic spline interpolation, 2 nd , 4 th , or 6 th order 
field solver, and in the case of the 6 th order field solver 
with cubic interpolation we also use the charge conserv- 
ing (CC) method for the current density with second or 
fourth order time stepping. 



B. A relativistic cold beam 

A particle-in-cell code is not Galilean invariant. When 
a cold plasma beam travels through the box at constant 
velocity the electrostatic fluctuations inside the Debye 
sphere or, if it is less than the mesh spacing, inside a 
single cell, will have resonant modes with the mesh spac- 
ing. This leads to an effective drag and redistribution 
of kinetic energy from the stream direction to the paral- 
lel direction, and general warm up of the beam. When 
the temperature reaches a critical level the instability is 
quenched. For relativistic beams there is the additional 
complication that electromagnetic waves are represented 
on the mesh. The solver has an effective dispersion rela- 



tion, and short wavelength waves travel below the speed 
of light. On the other hand, particles are Lagrangian, 
and if relativistic they can effectively travel above the 
speed of short wavelength waves, giving rise to numerical 
Cherenkov radiation. 

A classic method to limit the impact the of the cold 
beam instability is applying filters to either the current 
density or the electric field. This may to some extent 
filter out the effects, but will also filter out some of the 
physics. Alternatively, higher order field solvers, inter- 
polation techniques, and time stepping can mitigate the 
effects. To test the numerical methods used in the Pho- 
tonPlasma code, we have made a cold beam test with 9 
different versions of the code. We do not apply any filters 
to the current density, to show the actual performance of 
the different code versions. Apart from the 8 methods 
used for the numerical heating test there is also a ver- 
sion where instead of the implicit field solver a simple 
(but 6 th order) FDTD explicit solver is used. Notice how 
the explicit solver has numerical heating, while implicit 
solvers have numerical cooling 3 , and how the stability 
of the beam is greatly enhanced by the implicit solver, 
compared to using an explicit solver. But only the com- 
bination of the implicit solver with a charge conserving 
current deposition gives stable beams, with the smallest 
heating (see figure 7). It is also interesting to notice that 
the heating is non-isotropic. It is therefore not the same 
to initialize a plasma with a low but stable temperature, 
as to use a very low temperature and let the cold beam 
stability warm up the beam. 

The test is done in 2D2V with a T — 10 streaming 
pair plasma through a 256 x 256 cell domain with 10 
particles per species per cell and a relativistic skin depth 
S = [(mc 2 T) / {Aimq 2 )] 1 / 2 often cells. The initial temper- 
ature is measured in the rest frame of the plasma and has 
a root-mean-square per velocity component of 0.025c, so 
that ^rms(wth) = 0.05c. 

By rerunning at higher resolutions we have investi- 
gated at what resolution other methods give comparable 
results to the charge conserving method with fourth or- 
der time stepping, by comparing stream velocities, and 
parallel and perpendicular temperatures at uj p t = 1000 
(see table I). It is only at higher resolutions that the 
non-charge conserving methods do not suffer from the 
catastrophic instability seen in figure 7, and cost- wise the 
sixth order charge conserving method is marginally the 
cheapest. Had the test been in 3D, where the cost goes 
like resolution to the fourth power, the fourth order time 
integration method would have been the cheapest for this 
particular beam test. At high enough resolution the cold 
beam instability is quenched. For the fourth order charge 
conserving method this happens at a resolution of ap- 
proximately 3072 2 , where instead of a large heating rate, 
and then a new stable temperature, we observe a grad- 
ual heating over time. A resolution of 3072 2 corresponds 
to resolving the Debye length, = v t h/(Pc)5 e , with 
1.9 cells. Taking into account that we do not apply any 
damping, this is in good agreement with the common wis- 
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FIG. 6. Changes in ion and electron temperatures in the numerical heating experiment as a function of time for different choices 
of solvers. The two left (right) panels show the heating rate for 5 (50) particles per species per cell. Note that the heating rate 
is virtually independent of the type of solver, but is strongly dependent on the number of particles per cell. 
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FIG. 7. The top row shows the energy conservation and stream velocity in the original rest frame. The bottom row shows 
the evolution in the temperatures parallel and perpendicular (i.e. Tji and T±) to the beam direction. For reference, results are 
also included using a simple FDTD explicit solver for the electromagnetic fields. Notice that the runs have been done without 
applying any kind of filtering to the current density. 



dom that the Debye length has to be resolved by roughly 
one cell. 



C. Relativistic two-stream instability 

Relativistically counter-streaming plasmas have pre- 
viously been established to be subject to a general in- 



stability class; the oblique (or mixed-mode) two-stream- 
filamentation instability (MMI), which mixes the two- 
stream (TSI) and filamentation (FI) instabilities. A thor- 
ough and exhaustive analysis of the MMI was given by 
Bret et al 37,38 , and has been investigated numerically 
by several groups (see e.g. Tzoufras et al 39 and Dieck- 
mann et al 40 ). Due to its mixed nature, the MMI con- 
tains both an electrostatic and an electromagnetic wave 
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TABLE I. Resolution study for the cold beam instability 



Method Res Cost /is/part 



CCo 6 th order field; 4 th order time 256 2 


1 


6.60 


CCo 6 th order field 380 2 


0.9 


1.72 


Cubic interpolation; 6 th order field 512 2 


1.3 


1.05 


TSC interpolation; 2 nd order field 950 2 


4.9 


0.63 



the fastest growing MMI, FI, and TSI modes become 
7mm/ = 0.201, lF i = 0.153, 7 T5/ = 0.080. 

The computational domain is {L x ,L y ,L z } = 
{12.8<5 e , 12.8<5 e , 12.8J e }, with {N x ,N y ,N z } 
{128,128,128} cells. We use 20 particles/cell/species, 
or a total of 80 particles/cell (beam+background) . The 
physical constants are scaled as c = 1, q e = 1, m e = 1 
and mi/m e = 1836. 



component . 

Potentially, both electrostatic and electromagnetic tur- 
bulence (wavemode coupling leading to cascades/inverse 
cascades in k-space) is possible in such systems. This 
potential for producing very broadband plasma turbu- 
lence (in both E and B fields) is highly relevant to in- 
ertial confinement fusion experiments. Other examples 
where electromagnetic wave turbulence 41 and the gen- 
eral MMI are important are astrophysical jets and shocks 
from gamma-ray bursts and active galactic nuclei, where 
ambient plasma streams through a shock interface mov- 
ing at relativistic speeds. 

To test physical scenarios responsible for observational 
signatures from these astrophysical sources, and from 
plasma experiments, we must construct plausible shocked 
outflow conditions 27 ' 42,43 and then subsequently synthe- 
size radiation signatures 20 ' 30,31 to test the assumed phys- 
ical conditions against observational evidence. Studying 
the MMI, in both its linear and non-linear evolution is 
therefore well motivated. 

Growth rates of the (general) MMI, the special case of 
FI and the TSI, respectively, are calculated as in 37 : 

1mmi= 2-^ ^ T{v b y^ (51) 

Ifi = (3 a 1 ' 2 Tiv.r 1 ' 2 (52) 

Itsi = 2- 4 / 3 VS a 1 '* r^)- 1 (53) 

where j3 = v^/c is the beam velocity, a = rn,/n p is the 
beam-to-background density ratio, and T(vb) is the beam 
bulk flow Lorentz factor. For thin, high-Lorentz factor 
beams, the MMI is the fastest growing mode, dominat- 
ing over both the FI and the TSI. In the test we have 
chosen the MMI is dominant, with subdominant FI and 
TSI components. 

Wc perform six runs with identical initial conditions 
and physical scaling, using combinations of finite dif- 
ference operators and particle shape functions as given 
in Tablell below. This way, we test the Photon- 
Plasma code for differences/similarities between inter- 
polation schemes. 

To capture the MMI as the fastest growing mode, we 
initialize a simulation volume with a cold thin neutral 
beam (electrons + ions) through a warm thick neutral 
background (electrons + ions), with no fields initially. 
The beam and background densities are nj, = 0.1 and 
n p = 0.9, respectively. The beam velocity is chosen to 
have T(vb) — 4. Temperatures of the beam and back- 
ground are Tf, = 0.01 and T p — 0.1, respectively. With 
these choices of physical properties, the growth rates of 



TABLE II. Schemes order variation in the relativistic mixed- 
mode two-stream instability test case, for: finite difference op- 
erators (fields/sources), shape function (particle-mesh/mesh- 
particle interpolation), charge-conservation, time integration 
order. 



Run Fields Particles charge-conservation Time order 



1 


2 nd 


tsc 


no 


2 


2 


2 nd 


cubic 


no 


2 


3 


6 th 


tsc 


no 


2 


4 


6 th 


cubic 


no 


2 


5 


6 th 


cubic 


yes 


2 


6 


6 th 


cubic 


yes 


4 



For our choice of run parameters a MMI mode devel- 
ops with a propagtion wave vector, \<lmmi ; that is oblique 
with respect to the streaming direction, at an angle given 

by M mi = Z(k A f M /,kboam) = arctan (V w b/ U p) w 74°. 
The corresponding electric field component is almost par- 
allel to the direction of propagation 37 . 

To find growth rates of the MMI we calculate the vol- 
ume integrated electrostatic energy as a function of time 

E E ,tot((>MMi) = / \E±\sin(0 M Mi) + E\\cos(6 M mi) dV 

of the electric field projected on the propagation direc- 
tion, E(r) ■ kMMi- Similarly, we also measure the TSI 
mode growth rate (Eq. 53) by the same calculation, but 
for the TSI we have Otsi = ^(k-Tsi, z = 0). Our results 
are summarized in Table III. From figure 8 we see that 

TABLE III. Growth rates measured for the six runs listed in 
II for the two cases of the MMI and TSI. 



Run 


12 3 4 


5 


6 


theory 


7mm/ 


0.185 0.190 0.203 0.206 


0.205 


0.205 


0.201 


7T5/ 


0.060 0.079 0.065 0.082 


0.088 


0.088 


0.080 



the initial noise build-up prior to instability dominance 
is strongest (as expected) for Runl. This results in lower 
growth rates for lower order runs, since the noise tends 
to 'flatten the total energy history, with a lower jmmi as 
a result. This error in the measurement decreases with 
increasing run number (Runl^Run2^. . . ), with higher 
order runs' growth rates less susceptible to noise distor- 
tion. This is similar to what was seen in the cold beam 
tests: in a lower order integration scheme more energy 
is lost to artificial heating and Cerenkov radiation, mak- 
ing less energy available for the physical instabilities, and 
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changing the parameters (for example temperatures) of 
the plasma components, modifying the overall setup. 

Because of the lower level of noise, the onset of in- 
stability is delayed for runs of increasing order, while 
the peak energies and peak times coincide roughly for 
all runs; this effect correlates with the increased growth 
rates of the higher order runs, relatively Both of the 
effects mentioned above are caused by the difference in 
dynamic response of the various schemes. A higher order 
scheme is desirable since the noise levels, heating, nu- 
merical stopping power and dynamical friction are all re- 
duced considerably. The differences between using either 
TSC or cubic interpolation, second or sixth order field 
integration, and (no) charge conservation are all clearly 
reflected in the growth rates of the MMI and TSI. 

Nonetheless, growth rates are seen to converge with 
different methods to a value which deviates from the the- 
oretical prediction by about 2% for the MMI component, 
and ~10% for the TSI component, for the highest or- 
der schemes, and the overall development is in qualita- 
tive agreement for all test cases. Concluding this test, 
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FIG. 8. Growth of the volume-totaled electrostatic energy, for 
the electric field projected onto the direction of propagation 
of the fastest growing mode, E ■ \zmmi- Runs 1-6 are com- 
pared in the plot. Thickened line segment Runl (black) gives 
the fitting interval. Runs have decreasing initial energy for in- 
creasing run number designation (Table II). Runs 5 (orange) 
and 6 (yellow) are completely coinciding. 



we have verified the growth rate of the relativistic two- 
stream (or oblique or mixed-mode) instability, and found 
very good agreement with growth rates also when se- 
lecting the TSI branch. The slight excesses in values of 
Jt Si ([Runl, ...Run6]) is likely caused by the fact that the 
relativistic beam is perfectly grid aligned, thus subjected 
to the finite grid instability, which introduces additional 
electric field energy in the beam direction, while for lower 
orders, this is more than compensated by the overall dis- 
sipation to all electromagnetic and particle components. 



D. Radiatively cooled collisionless shocks 

The radiative cooling currently implemented is in- 
spired by the early work of Hededal; he validated it for a 
simple test case of a radiatively cooling charged particle 
in a homogeneous magnetic field 20 . For a more non triv- 
ial application we here for the first time present a series of 
simulations of radiatively cooled initially non- magnetized 
collisionless shocks. If we assume that the acceleration 
of a particle in a collisionless shock is mostly due to a 
homogeneous magnetic field B, we can estimate the syn- 
chrotron cooling time for an initial Lorentz factor 70 as 
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(54) 



Consider a relativistic collisionless shock in the con- 
tact discontinuity frame, with the upstream moving at 
a Lorentz factor T and having a number density n. The 
kinetic energy density in the upstream is 
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(r- l)n 



mc 2 



(r - l)Mc 2 n. 



(55) 



We assume that the magnetic energy density at the shock 
interface Eb = B 2 /2fiQ is related through some effi- 
ciency factor a ~ 0.1 to the upstream kinetic energy den- 
sity. The relativistic plasma frequency in the upstream 
medium is w 2 e = q n/(eQm e T). Putting it all together 
we can find the synchrotron cooling time at the shock 
interface, for a particle with a gamma factor 70, to be 
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(56) 



For a pair plasma M = 2m e , and if we consider upstream 
particles 70 = T it reduces to 
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(57) 



We use this definition to label the different runs. Our 
definition of the cooling time is similar to the one 
given in Medvedev and Spitkovsky 44 , but differs because 
they considered the downstream skin depth: T c s ^"j our = 
9 / 4r i/2 T syn m g We haye made long term 2D2V gimu _ 

lations of the shock, using reflecting boundaries and dif- 
ferent cooling times, including a run without cooling for 
reference, and with electron-ion and pair plasmas. The 
runs were done with cubic interpolation, sixth order field 
pusher and a 17-point current density filter. We used 
the sliding window technique to be able to follow the 
development of the shock up to uj pe t = 5000, and used 
more than 5 billion particles to model the shocks. In all 
cases the upstream Lorentz factor is V = 10. While be- 
low we present results from runs with 20 cells per skin 
depth, we have used both 10, 20, and 40 cells per skin 
depth and between 12 and 24 particles per species per 
cell and find converged results. Examples of the mag- 
netic field and phase space density for different cooling 
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FIG. 9. Left panel: Magnetic field density normalized to the upstream kinetic energy density es at ui pe t — 1600, shown in a 

1/2 

500 <5 e cutout near the shock interface. To enhance the dynamic range a signed e s is shown. Right panel: Corresponding 
phase space density. Notice how not only the phase space is quenched by the cooling, but also the magnetic field density at the 
shock interface, and downstream of the shock it depends on the cooling rate. The velocity range is different for the different 
cases. 



times can be seen in figure 9. The shocks can phenomeno- 
logically be categorized into three types: shocks in the 
strong cooling regime, radiative shocks, and weakly ra- 
diative shocks. In the strong cooling regime, there are 
no traces of a power law tail of accelerated particles, and 
both the evolution, shock jump conditions, and the micro 
structure itself are qualitatively different from a normal 
non-radiating shock. In particular, the upstream is com- 
pletely unperturbed by the existence of the shock. This 
is found for cooling times T coo i < 200. In a strongly ra- 
diating shock the micro structure is disturbed, and the 
downstream magnetic islands only survive very close to 
the shock interface, but the upstream does have resem- 
blance to a normal collisionless shock. The power law tail 
of accelerated particles is completely gone, and the shock 
jump conditions are altered. This is found for cooling 
times 200 < T CO oi < 1000. The weakly radiative shocks 
are similar in structure to a non-radiating shock, but 
with mildly perturbed shock jump conditions and prop- 
agation velocity (see figure 10). The power law index 
for the high energy population of accelerated electrons 
is steeper than for a non-radiating shock, with a depen- 
dence on the cooling time. The magnetic energy density 
decreases significantly approximately 150 S e away from 
the shock interface, but the time a high energy particle 
spends that close to the shock transition, where the bulk 
of the cooling happens, is a stochastic function of its an- 
gle to the shock normal, and the number of scatterings 
on fluctuations in the electromagnetic field. In principle, 
if we simulate for a long enough time, with a long cool- 
ing time and with high statistics, the powerlaw tail of 
accelerated particles will grow over more than a decade 
in energy, and a well defined cooling break should emerge 
with two different slopes clearly visible. In practice, given 
the tangled nature and stochastic propagation, the break 



will most probably be smooth, significantly smeared out 
around the characteristic energy, where electrons start to 
be efficiently cooled. In these exploratory simulations the 
emergence of a cooling break does not occur. Instead, in 
the case of weakly radiative shocks, the high-energy part 
of the particle distribution (PDF) is a powerlaw with an 
exponential cut-off, but with a steeper powerlaw index 
than in the non-radiating case (see fig. 11). It is well 
known that in PIC simulations of non-radiating shocks 
the upstream region affected by high energy particles pro- 
duced at the shock interface only grows with time, and 
it has been an open question what the long term struc- 
ture looks like 45 . This is different for radiatively cooled 
shocks, where at large times the shock settles down to 
a quasi-steady state, making it possible to draw con- 
clusions about the long term behavior. The extent of 
the upstream is limited, and the powerlaw part of the 
PDF does not grow in time. Any collisionless shock is 
radiatively cooling, given long enough time, and from 
our simulations it is clear that the impact of cooling for 
weakly radiating shocks is greater than speculated in e.g. 
Medvedev & Spitkovsky 44 , where analytic estimates were 
used. Given the possibility of collisionless shocks to medi- 
ate secondary instabilities such as the Bell instability far 
upstream, by the generation of streaming cosmic rays 46 , 
it would be interesting to understand the impact on col- 
lisionless shock for the very large cooling times expected 
in e.g. GRB afterglows. We speculate that using progres- 
sively larger cooling times in sufficiently large simulations 
could give insight in how to scale the solution to arbitrar- 
ily long cooling times. 
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FIG. 10. Top panel: Density ratio between up and down 
stream at oj pe t — 1600, as a function of cooling time. Bot- 
tom panel: Effective adiabatic index, derived from the shock 
velocity. The right most point is for a run without radiative 
cooling. 
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FIG. 11. The particle distribution function sampled in the 
downstream region for the case of no cooling and T coo i = 
25600, firmly in the weakly radiative regime. The power law 
index is indicated with the dashed line, and given in the leg- 
end. 



E. The linear magnetized Kelvin-Helmholtz instability 

Earth's magnetopause constitutes an important region 
in space. It is the boundary layer, separating the Earth's 
magnetosphere from the solar wind. In this region, the 
Kelvin-Helmholtz instability (KHI) is driven by veloc- 
ity shears between the magnetosheath and the magne- 
tospheric plasma at low latitude (close to the magnetic 
equator). We have selected the problem of the linear 
magnetized KHI to compare with results using our local 
MHD Stagger code. Essentially, we use the setup for 
the SWIFF Magnetopause Challenge 47 code comparison, 
except that the amplitude is slightly different and there 
are half the number of skin depths in the kinetic case. 



The MHD and PIC code setups are nearly identical, the 
only difference being the addition of microphysical pa- 
rameters, and that in the PIC case we must ensure that 
the initial condition is a kinetic equilibrium respecting 
constraints like Gauss law. 

We compare our PIC results against those obtained 
with the Stagger code, also developed and maintained 
at the Niels Bohr InstituteNordlund and Galsgaard 13 . 
This finite difference mesh based MHD code is a fully 
3D resistive and compressible MHD code. The MHD 
variables are located on staggered meshes, and the dis- 
cretization is very similar to the PhotonPlasma code, 
with sixth order spatial derivatives, and fifth order inter- 
polation of variables. The time integration of the MHD 
equations is performed using an explicit 3rd order low 
storage Runge-Kutta method 48 . 

The experiment is a periodic 2D3V setup, and the box 
size is L x — 907r, L y — 30tt. The initial velocity field 
V = V y (x)e y contains a periodic double shear layer, to 
avoid boundary effects, with a velocity amplitude A eq = 
1. The sheared velocity jumps are located at L x /4 and 
3/4 L X1 and the transition width is a — 3. With these 
parameters, the initial velocity profiles is defined as: 



V y (x) = tanh 



L x /4 



tanh 



3/4L x 



Initially, J eq = E = 0, B — Bosin(8)e y + Bocos(8)e z , 
with Bq — 1, — 0.05. The density, pressure and Alfven 
velocity are all unity: p = P = Va = 1- For the MHD we 
use an adiabatic equation of state with 7 gas = 5/3, and 
impose a small perturbation in the velocity field to seed 
the KHI <5V = e z x Vip, where 

Ny/A 

4' = ef{x) 2J cos(2Trmy / L y + (f> m )/m , 

m— 1 

and 

f(x) = exp 



x - L x /A 



-exp 



io- 



x - 3/4L Q 



are random 



and e is such that maxd^yl) 
phases. 

In the MHD case we resolve the box with N x — 1536, 
N y = 512 cells. The PIC setup was prepared with phys- 
ical initial conditions essentially identical to the MHD 
setup, using the technique of section VD to reach an 
approximate kinetic equilibrium. The objective was to 
produce the ion-scale KHI, while resolving the electron 
skin-depth S e and having well separated inertial scales 
with rrii/m e = 64. We use a setup similar to what was 
used for a hybrid code in the SWIFF comparison 47 , but 
limit the number of ion skin depths to 457T x 157T, and 
the resolution to N x = 6144, N y = 2048. To have a rea- 
sonable plasma frequency, we rescale the speed of light 
to c = 10, and use <5 e = 6Ax, and 20 particles per cell 
per species with a total of roughly 500 million particles 
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in the box. Our choice of the shearing jump amplitude 
(Vo = ±1.0) and width (a = 3.0) selects a fastest grow- 
ing mode (FGM) leading to production of two vortices, 
which pair up and merge during the early and late non- 
linear stages of the KHI. Figure 12 shows the vortices 
just prior to, and well after the vortex merging in the 
PIC model. 

The growth rates for the two extremum cases of trans- 
verse (B ± v ) and parallel (B || v ) configurations 
were investigated theoretically and further calculated by 
Miura 51 ' 52 . We may assume that for a weak parallel com- 
ponent, i.e. Bq z -C Bo y (Bq z — 0.05Bo y ), the instability 
evolves almost as the ideal transverse case to a good ap- 
proximation. It was also determine d that the stability 
criterion for V — 2 is Mf = Vo/y/v\ + c 2 s < 2. Here, 
Mf is the fast mode magneto-acoustic Mach number. 

For our setup Mf = >/2, with both v\ = l,c 2 s = 1, 
using Vb = 2 and a = 3. From Miura 1982 51 , we find 
a growth rate for our specific setup to be 2a"fKH/Vo ~ 
0.162, which leads to the final result of 

1kh,fgm ~ 0.055 ± 0.002. 

To measure the KHI rate growth — in both the MHD 
and PIC cases — we calculate the quantity 



N v -1 



^ ] V x (xi,yj,t n )e 



V2ix 



(58) 

i.e. the power spectra along the y-axis of the x-component 
of velocity, V x , at constant x*. This is then averaged in 
the x-direction 



1 

Q(tn) — ^ ' Ur^i; ky, t n ) 

Xhi X/o I 



(59) 



for two different sets of {xi ,Xhi}, namely i) centered on 
one shearing layer, 5A X wide, and ii) across the entire 
half-volume in the x-direction. We then make a fit to 
exponential growth to obtain jki (mode) for the FGM. 



TABLE IV. Growth rate, Jkh,fgm for the magnetized KHI; 
comparison between identical runs with the PhotonPlasma 
code and the Stagger code. 





PIC 


MHD 


Right 


0.058 


0.058 


Total 


0.050 


0.056 



We find that growth rates from both the MHD and PIC 
simulations agree well with theory, to within about 10% 
(see table IV) . Averaging over all layers in the simulation 
half-plane yields a lower growth rate with the MHD in 
closest agreement with theory. This is expected, since 
averaging over only 5 layers does not capture the entire 
width of the shearing layer which is W 'shear ~ 50 A^. For 
the PIC growth rate results discrepancies in physics and 



uncertainties in data fitting, due to mode coupling are 
higher than for the MHD case; and growth rates therefore 
differ by as much as 10% in the in the volume-averaged 
case. The explanation for a slightly lower growth rate 
in the total volume averaged PIC case may be due to 
enhanced dissipation and intrinsic noise properties of the 
PIC code, or the development of secondary instabilities. 

Concluding this test, we emphasize that a PIC code 
has been used to run a fully MHD problem in PIC ex- 
plicit mode, with almost identical growth rates and vor- 
tex structures in the linear phase. For the vortex-merging 
epoch of the PIC run, results deviate qualitatively and 
quantitatively from MHD, due to kinetic effects and sec- 
ondary tearing-like instabilities developing during the 
merging stage. 



IX. PARALLELIZATION, 
PERFORMANCE 



SCALING AND 



Modern 3D Particle-in-Cell experiments in astro- 
physics use billions of particles to simulate the macro- 
scopic structure of plasmas. To run these simulations the 
code has to be massively parallel. The PhotonPlasma 
code started with a simple domain decomposition along 
one axis, using MPI. Later it was changed to support a 
3D MPI decomposition, and in the current version we 
use a hybrid parallelization with OpenMP and MPI to 
scale the code effectively up to hundreds of thousands 
of cores (with 262.144 cores being the largest case tried 
so far). The OpenMP hybrid approach is also necessary 
when running the GPU version of the code, because nor- 
mally the ratio of CPU cores to GPUs in a GPU system 
is larger than one. 



A. MPI 

The MPI parallelization has been designed to be as 
transparent as possible, and consists of different modules, 
all collected in a single file. This makes it easy to compile 
the PhotonPlasma code both with and without the 
MPI library. 

• Particles: After moving the particles on a node, 
and applying the physical boundary conditions, on 
each thread we check sequentially in the x-, y-, 
and z-direction which particles have moved to other 
threads, and interchange particles accordingly. A 
reverse transversion of the linked list structure con- 
taining the send particles makes it efficient to store 
the received particles in the same slots. 

• Fields: When the physical boundary conditions are 
applied for the fields, ghostzones are also exchanged 
between threads. 

• I/O: The snapshot and restart file format is bi- 
nary, and the code uses MPI-IO. For the particles 



21 




20 40 60 80 100 120 140 



80 
60 
40 
20 




t=35 




20 40 60 80 100 120 140 



FIG. 12. Early non-linear (left) and late non-linear vortex paring (right panel) stages of a PIC code model of the Kelvin- 
Helmholtz instability. The small scale electron density waves barely visible in the left panel are probably because of the initial 
condition not being a perfect kinetic equilibrium 49 ' 50 . 



each attribute (ie. the ^-position) is stored in a sin- 
gle block, and the IO-types are all 64-bit, making 
it possible to store billions of particles in a sin- 
gle snapshot with GB/s of performance. The code 
stores everything in a few files, and restarts can be 
done on an arbitrary number of threads. 

• Synthetic Spectra: The spectra are sampled on sub- 
volumes of the data, and each thread only allocates 
data for the sub-volumes that intersect with the 
local domain. This makes the mechanism scalable 
to 0(1O 5 ) sub-volumes, without wasting memory. 
The lowest ranked thread for a given sub- volume 
writes the data, making the I/O scalable too. 

• Particle Tracing: Particle tracing is done locally on 
each thread, while only the master thread writes 
the I/O. This is a potential bottleneck, and on sys- 
tems with relatively weak CPUs, e.g. Blue-Gene/P 
we are limited to tracing ~1 million particles, while 
for x86-based clusters with global Lustre filesys- 
tems we can effectively trace up to ^10 million par- 
ticles, without significant slowdowns. 

• 2D-slices: The code can extract (averaged) 2D 
slices on-the-fly while running. While this is very 
convenient for movie making, for large runs thread 
contention becomes a problem when reducing the 
data on a single thread: When thousands of threads 
try to communicate with a single thread, the slow 
down can be significant. We use single-sided MPI 
to effectively circumvent the thread contention, 
making slice writing scalable to at least 100.000 
MPI threads on network transports that support 
RDM A. 

A major problem with simple domain decomposed 
particle-in-cell codes is that systematic local over- 
densities of particles easily occur, for example in colli- 
sionless shock or laser wake-field simulations. This leads 
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FIG. 13. Density profile of a 2D collisionless shock. The 
dashed lines indicate the z-boundaries of MPI domains. They 
are updated dynamically to equalize the load. 



to severe load imbalance. To work around this the Pho- 
tonPlasma code has a simple dynamic load-balance fea- 
ture. If enabled, mesh-slices along the z-direction can be 
exchanged in-between neighboring threads, according to 
a user defined cost formula. In the current version parti- 
cles and cells are assigned a certain cost, and this makes it 
possible to dynamically sample local spikes in the parti- 
cle density (see Figure 13). When a slice of cells is moved 
from a thread to the neighbor thread the corresponding 
field values, particle data, synthetic spectra sub- volumes, 
and local random number generators are updated. The 
same mechanism is used to make an optimal distribution 
of cells when restarting a run. We have tested the MPI 
performance in a simple, but realistic setup, with two 
counter streaming plasma beams on the Blue-Gene/P 
machine JUGENE running in pure MPI mode. Figure 
14 shows weak scaling behavior of the code from 8 to 
262.144 cores. This was done a few years ago, with a 
version of the code without charge conservation, and the 
scaling of the current code is significantly better. 
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FIG. 14. Weak scaling in pure MPI mode when running on 
the BG/P machine JUGENE. The setup is a relativistic two- 
stream experiment with periodic boundaries and a T = 10 
streaming motion. There are roughly 80 particles per cell 
with 16 3 cells per MPI domain. This experiment is not using 
the charge conserving current deposition, and the scaling ol 
the current code is significantly better. 



B. OpenMP 

The 3D MPI domain decomposition has served well, 
scaling the code, for most applications, to 10.000 cores 
on x86 clusters and more than 100.000 cores on Blue- 
Gene/P. To scale the code to a million threads or more 
on for example Blue-Gene/Q, one must introduce a new 
layer of parallelization. We have used OpenMP for a 
number of reasons: It is relatively easy to get almost 
perfect scaling for a low number of cores, it can be done 
incrementally, and it is a natural thing to do for the GPU 
version of the PhotonPlasma code. Furthermore, us- 
ing a hybrid parallelization, the size of each MPI domain 
becomes bigger, and the particle load-balance require- 
ments, due to fluctuations and variations in number den- 
sity, become smaller. This is important for experiments 
with some level of density fluctuations, even if the scaling 
per se is good in pure MPI mode. In the current version 
the most CPU consuming parts of the code have been 
OpenMP parallelized: 

• The Mover / Charge deposition is trivially paral- 
lelized by allocating one set of charge density fields 
per thread, and parallelizing the update of particles 
on a per cell basis. 

• The field solver consist of simple differential op- 
erators, and we use loop-based parallelization for 
perfect speedup. 

• The sorting of particles is more challenging: We 
first parallelize on the number of species, this is 
embarrassing parallel. Then on a nested level we 
first partition the particle data with nested parallel 
sweeps in 128 sets, and afterwards use quick-sort to 
sort each set of particles in parallel. 

• Sending particles between threads: We first paral- 
lelize on the number of species, this is embarrass- 
ing parallel. Then on a nested level we use loop 



parallelization to select particles to be send, and 
multiplex MPI communication with OMP sections. 

• Several other auxiliary routines have been paral- 
lelized and made thread safe: The random gen- 
erators, synthetic spectra sampling, initial- and 
boundary conditions. 

The rest of the code, mostly diagnostics and I/O, can 
be incrementally OpenMP parallelized as needed. We 
find excellent scaling inside x86 nodes, with optimal per- 
formance at 4 MPI threads per socket / 8 per node, 
only loosing about 5% when using 1 MPI thread per 
socket. The 5% is easily regained in higher MPI effi- 
ciency and better load-balance when running on a large 
number of nodes (top panel in Fig. 15). We also find 
that we gain roughly 15% in performance by running 
with hyperthreading (32 compared to 16 threads in bot- 
tom panel in Fig. 15). Finally, on Blue-Gene/Q it is 
absolutely crucial to use all 64 hardware threads on a 16 
core node. It improves overall performance by a factor of 
2. But to effectively use the massive amount of threads 
exposed in the system, we need OpenMP, since other- 
wise the domain size is too small. Currently we use 4 or 
16 OpenMP threads per node, depending on the scale of 
the problem. Even though we loose 16% performance by 
going from 4 to 16 OpenMP threads, for runs with e.g. 
particle tracing enabled or other IO done only through 
the master thread, and for very large runs, where load 
imbalance can be more than 20%, it is advantageous to 
use 16 OpenMP threads per MPI thread. In simpler and 
smaller scale runs we use 4 OpenMP threads per MPI 
thread. 



X. CONCLUDING REMARKS 

In this paper we present the PhotonPlasma code 
particle-in-cell code, with new numerical methods, phys- 
ical extensions, and parallelization techniques. Originally 
the main motivation for developing a new particle-in-cell 
code from scratch was to implement a modern, modular- 
ized and extendible code, with modern numerical tech- 
niques. In the paper we have demonstrated how the ex- 
tension of the classical PIC framework to higher order 
spatial and temporal derivatives, together with a novel 
charge conservation scheme, gives a much higher accu- 
racy and better stability than traditional, second order 
PIC codes, when using the same number of grid points. 
Conversely, fewer mesh points are needed to reach a given 
level of fidelity in realistic three dimensional kinetic as- 
trophysical setups, which leads to large savings in compu- 
tational costs. The excellent scalability of the code, and 
the ability to do diagnostics on-the-fly, makes it possible 
to achieve a high scientific turnaround, where numeri- 
cal experiments with more than 100 billion particles can 
be performed in a day, with dedicated use of petascale 
computational resources. 
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FIG. 15. Weak scaling in hybrid mode when running on a 16 core E5-2670 2.6GHz Sandy Bridge node and the BG/Q machine 
JUQUEEN. The setup is a relativistic two-stream experiment with periodic boundaries and a T = 10 streaming motion. There 
are 60 particles per cell. This experiment is using the charge conserving current deposition. The label gives number of OpenMP 
threads times MPI threads, and on both machines we take advantage of hyperthreading, while performance is measured per 
physical core. The left panel shows weak scaling for a fixed domain size of 16 3 cells per thread. The middle panel shows weak 
scaling on 64 nodes or 4096 threads on JUQUEEN with only 12 3 cells per thread. The right panel shows strong scaling keeping 
the domain constant at 50 cells. 



Among the novel physical extensions of the Photon- 
Plasma code code are the description of binary interac- 
tions by particle splitting, the ability to include radiative 
cooling in a self-consistent manner, the possibility of em- 
bedding the kinetic model in an MHD snapshot, and the 
application of a sliding simulation window. Other exten- 
sions are in development or are planned for the future, 
among them nested grids, proper inclusion of neutral par- 
ticles and dust, self gravity, smooth transition from the 
kinetic to the MHD regime, and simpler binary collision 
drag descriptions, relevant for studying fractionation pro- 
cesses. 

The PhotonPlasma code has already been in full 
production mode for some years, the user base is growing 
steadily, and we expect that it will be used extensively 
at the Niels Bohr Institute and elsewhere for many years 
to come. Hopefully the detailed description of the algo- 
rithms documented in this paper will also be of use to 
others, when developing particle-in-cell techniques and 
applying them astrophysical plasma physics problems in 
the future. 
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